The effectiveness of adaptive filters depends mainly on their design technique and their adaptation algorithm. The most common adaptation technique is least mean square (LMS), owing to its computational simplicity. The adaptive filter configuration is chosen according to the application, and adaptive filters are well known for system identification and real-time applications. In this work, a modified delayed μ-law proportionate normalized least mean square (DMPNLMS) algorithm is proposed. It is an improved version of the μ-law proportionate normalized least mean square (MPNLMS) algorithm. The algorithm is realized using a Ladner-Fischer parallel prefix logarithmic adder to reduce silicon area. The algorithm is simulated in MATLAB, and the very large-scale integration (VLSI) architecture is implemented using the Vivado suite and a 90 nm complementary metal-oxide-semiconductor (CMOS) technology node with the Cadence RTL and Genus compilers. The DMPNLMS method exhibits a lower mean square error, a higher rate of convergence, and greater stability. The synthesis results demonstrate that it is area and delay efficient, making it practical for applications that require a high operating speed.
1. INTRODUCTION

In the digital world there is a need for a higher level of intelligence and accuracy. Digital circuits are the basic building blocks of any smart system, and signal processing plays a vital role in determining circuit performance [1], [2]. Filters are among the most common circuits in signal processing, and filtering is important because of the presence of noise [3].

Noise can enter a circuit by many means and degrade its performance. To make the circuit less sensitive to noise, an efficient filter must be designed. The basic principle of a filter is to remove unwanted signals and pass only the desired signal [4] that is required for circuit operation, and the response of the filter to different kinds of noise is considered when evaluating a filter design. There are two types of filters, namely finite impulse response (FIR) filters and infinite impulse response (IIR) filters. As the names suggest, the impulse response of an FIR filter is finite and becomes zero after a finite number of samples, while that of an IIR filter is infinite [4], [5].
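The FIR/IIR distinction can be seen directly by driving each type of filter with a unit impulse. The minimal Python sketch below is illustrative only; the three tap values and the feedback coefficient 0.5 are arbitrary choices, not parameters from this work.

```python
def fir_impulse_response(b, n_samples):
    """y[n] = sum_k b[k] * x[n-k]: no feedback, so the response is just the taps."""
    x = [1.0] + [0.0] * (n_samples - 1)  # unit impulse
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(n_samples)]


def iir_impulse_response(a1, n_samples):
    """y[n] = x[n] + a1 * y[n-1]: the feedback term keeps the response alive."""
    x = [1.0] + [0.0] * (n_samples - 1)  # unit impulse
    y, prev = [], 0.0
    for n in range(n_samples):
        prev = x[n] + a1 * prev
        y.append(prev)
    return y


print(fir_impulse_response([0.5, 0.3, 0.2], 8))  # [0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
print(iir_impulse_response(0.5, 8))              # 1.0, 0.5, 0.25, ... halves at every sample
```

The FIR output is exactly the list of taps followed by zeros, whereas the feedback term keeps the IIR output nonzero indefinitely.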

A filter should adapt to changes in its operating environment; a filter that can do so is called an adaptive filter. Adaptive filters are realized with either IIR or FIR structures whose coefficients can be updated to obtain the desired signal. For example, by varying the coefficients of an FIR filter according to changes in the operating conditions, the filter can be made to adapt to those changes [6].

The key component of an adaptive filter is its algorithm, which updates the filter coefficients iteratively as the environment changes. Least mean square (LMS) is one such algorithm: it mimics the response of the desired filter by estimating the filter coefficients that minimize the mean square of the error signal, where the error signal is the difference between the desired signal and the filter output [7]. The LMS algorithm uses a fixed step size (μ), which leads to the gradient noise amplification problem and weak convergence. To overcome these problems, the normalized least mean square (NLMS) algorithm is used. NLMS normalizes the step size by the input power and adds a small positive number (ε) to the normalization term, which makes its performance better than that of the LMS algorithm [8]. Many other algorithms have been proposed, such as least mean logarithmic square (LMLS), which combines the advantages of the LMS and least mean fourth (LMF) algorithms, and least logarithmic absolute difference (LLAD), which offers the advantages of the LMS and sign LMS (SLMS) algorithms [7], [8].
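The difference between the LMS and NLMS update rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB code used in this work: the helper names (`lms_update`, `nlms_update`, `identify`), the step sizes, and the three-tap test system are hypothetical. Because NLMS divides its step by ε + ||x||², its behavior does not change when the input is scaled, whereas a fixed LMS step that is stable for unit-power input can become unstable for a much larger input.

```python
import random


def lms_update(w, x, d, mu):
    """One LMS step: y = w.x, e = d - y, w <- w + mu * e * x (fixed step size)."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    return [wi + mu * e * xi for wi, xi in zip(w, x)], e


def nlms_update(w, x, d, mu, eps=1e-6):
    """One NLMS step: the same correction, divided by eps + ||x||^2."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    norm = eps + sum(xi * xi for xi in x)  # eps avoids division by zero
    return [wi + (mu / norm) * e * xi for wi, xi in zip(w, x)], e


def identify(update, h_true, n_iter, scale, **step):
    """Adapt w towards an unknown FIR system h_true driven by scaled white noise."""
    rng = random.Random(1)
    w = [0.0] * len(h_true)
    x = [0.0] * len(h_true)  # tapped delay line, x[0] is the newest sample
    for _ in range(n_iter):
        x = [scale * rng.gauss(0.0, 1.0)] + x[:-1]
        d = sum(h * xi for h, xi in zip(h_true, x))  # noiseless desired signal
        w, _ = update(w, x, d, **step)
    return w


h_true = [0.8, -0.4, 0.2]
print(identify(lms_update, h_true, 500, 1.0, mu=0.05))   # close to h_true
print(identify(nlms_update, h_true, 500, 10.0, mu=0.5))  # still close with a 10x input
```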

Proportionate LMS (PLMS) algorithms were introduced to track sparse impulse responses faster. Proportionate NLMS (PNLMS) performs better than NLMS, with faster convergence and improved mean square error (MSE) [9]. The proposed algorithm, delayed μ-law proportionate normalized least mean square (DMPNLMS), is an improvement over the μ-law proportionate normalized least mean square (MPNLMS) algorithm. The remainder of the paper is organized as follows: section 2 describes the proposed architecture and the implementation of the DMPNLMS algorithm, section 3 discusses the simulation results, and section 4 concludes the work.


2. PROPOSED DMPNLMS ARCHITECTURE 

The architecture of the DMPNLMS filter is shown in Figure 1. The input signal u(n) is fed into the tap coefficients, each of which has an arithmetic delay of 'X' units; unit delay registers are used to introduce this delay. The outputs of the tap coefficients are fed into the parallel prefix logarithmic adder. The output of the adder is multiplexed with the desired signal, which contains some error, and the result is fed into the desired function block to obtain the DMPNLMS filter residue. A loop-back path is formed so that the filter keeps adapting over n successive iterations [10]-[12]. Thus, the designed DMPNLMS filter consists of the tap coefficients, a parallel prefix logarithmic adder, a desired function, and a desired block. The tap coefficients are crucial in adaptive filtering: they are the weights applied to the various input values to form the filter output, and they are modified during the learning process to improve the effectiveness of the filter.
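The forward path described above (unit delay registers feeding the tap weights, whose products are summed and combined with the desired signal to form the error) can be modeled behaviorally as follows. This is a floating-point sketch under assumptions: the class name `TransversalFilter` is hypothetical, the weights are held fixed, the per-tap arithmetic delay 'X' is not modeled, and the coefficient update on the loop-back path is left out.

```python
from collections import deque


class TransversalFilter:
    """Behavioral model: a shift register of unit delays, fixed tap weights, an adder."""

    def __init__(self, weights):
        self.w = list(weights)
        # One register per tap; with maxlen set, appendleft() drops the oldest sample.
        self.regs = deque([0.0] * len(self.w), maxlen=len(self.w))

    def step(self, u_n, d_n):
        """Clock in u(n); return the output y(n) and the error e(n) = d(n) - y(n)."""
        self.regs.appendleft(u_n)  # regs[l] now holds u(n - l)
        y_n = sum(w_l * r for w_l, r in zip(self.w, self.regs))
        return y_n, d_n - y_n


f = TransversalFilter([0.5, 0.25, 0.125])
for u_n, d_n in [(1.0, 0.5), (2.0, 1.0), (0.0, 0.0), (-1.0, 0.0)]:
    print(f.step(u_n, d_n))
```

Each call corresponds to one clock: the newest sample enters the first register, every older sample shifts one place, and the adder output y(n) is compared with d(n).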


Figure 1. DMPNLMS architecture
A crucial element for efficient computation inside the filter is the parallel prefix logarithmic adder, which performs the arithmetic, largely in parallel, to speed up filter operation. The filter's objective is defined by the desired function, which represents the result that the filter seeks to produce [13]-[16]. Each subcomponent is discussed in the following subsections. The coefficient update equation of DMPNLMS is given in (1); it differs slightly from NLMS through the additional step-size control matrix Q.


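$$\hat{\mathbf{h}}(n-N+1) = \hat{\mathbf{h}}(n-N) + \frac{\beta\,\mathbf{Q}(n-N)\,\mathbf{x}(n-N+1)\,e(n-N+1)}{\mathbf{x}^{T}(n-N+1)\,\mathbf{Q}(n-N)\,\mathbf{x}(n-N+1) + \delta} \tag{1}$$

where β is the step size, δ is a small positive regularization constant, and N is the adaptation delay. The diagonal matrix Q(n-N) controls the step size of each coefficient and is evaluated using (2) and (3).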
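$$\mathbf{Q}(n-N) = \operatorname{diag}\{q_0(n-N), q_1(n-N), \ldots, q_{L-1}(n-N)\} = \begin{bmatrix} q_0(n-N) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & q_{L-1}(n-N) \end{bmatrix} \tag{2}$$

The control matrix elements can be expressed as (3):

$$q_l(n-N) = \frac{k_l(n-N)}{\frac{1}{L}\sum_{i=0}^{L-1} k_i(n-N)} \tag{3}$$

where L is the filter length, ρ is a small positive constant, and

$$k_l(n-N) = \max\left\{\rho \cdot \max\left\{F(|\hat{h}_0(n-N)|), \ldots, F(|\hat{h}_{L-1}(n-N)|)\right\},\; F(|\hat{h}_l(n-N)|)\right\}$$

$$F(|\hat{h}_l(n-N)|) = \frac{\ln\left(1 + \mu\,|\hat{h}_l(n-N)|\right)}{\ln(1 + \mu)}, \qquad |\hat{h}_l(n-N)| < 1 \ \text{and} \ \mu = \frac{1}{\varepsilon} \tag{4}$$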
The constant 1 inside the logarithm avoids the negative infinity that the logarithm would otherwise produce at the initial stage, when the coefficient estimates are zero. The denominator ln(1 + μ) normalizes F(|ĥ_l(n-N)|) to the range [0, 1]. The value of ε is a small positive number that should be chosen according to the background noise level; ε = 0.001 is a good choice because echo below -60 dB is negligible. The general design methodology used in the current work is summarized as follows [17]-[20]:

- The convergence rate and stability are evaluated through MATLAB simulation, which verifies the functionality of the algorithm and establishes the concept for the current and previous works.

- Field programmable gate array (FPGA) synthesis is carried out in Vivado on a Kintex-7 device to implement the digital system.

- Application specific integrated circuit (ASIC) synthesis is also carried out to obtain area, timing, and power parameters.
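The step-size control of (2) to (4) and the delayed update of (1) can be sketched together in Python to show how the proportionate gains are formed and applied. This is a minimal floating-point sketch under stated assumptions, not the authors' MATLAB model or hardware: the function names, ρ = 0.01, δ = 0.01, β = 0.25, the delay N = 2, the floor term `delta_p` (borrowed from the standard MPNLMS formulation so that an all-zero initial estimate yields usable gains), and the sparse 16-tap test system are all assumptions. The update applied at time n uses the input vector, error, and gains from time n-N, as a pipelined datapath would; the exact index alignment of the paper's hardware may differ.

```python
import math
import random


def mpnlms_gains(h_hat, rho=0.01, eps=0.001, delta_p=0.01):
    """Diagonal entries q_l of Q per (2)-(3), using the mu-law mapping F of (4)."""
    mu = 1.0 / eps  # mu = 1/epsilon, as in (4)
    F = [math.log(1.0 + mu * abs(h)) / math.log(1.0 + mu) for h in h_hat]
    # ASSUMPTION: delta_p (standard in MPNLMS) keeps the floor non-zero when the
    # estimate is all zeros, so the initial gains are equal instead of 0/0.
    floor = rho * max([delta_p] + F)
    k = [max(floor, f) for f in F]      # k_l: floored mu-law magnitude
    mean_k = sum(k) / len(k)
    return [k_l / mean_k for k_l in k]  # q_l = k_l / ((1/L) * sum_i k_i)


def dmpnlms_identify(h_true, n_iter, delay=2, beta=0.25, delta=1e-2, seed=0):
    """Identify h_true; each update uses data, error and gains `delay` samples old."""
    rng = random.Random(seed)
    L = len(h_true)
    w = [0.0] * L
    x = [0.0] * L  # tapped delay line, x[0] is the newest sample
    pending = []   # (x, e, q) triples waiting in the pipeline
    for _ in range(n_iter):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]
        d = sum(h * xi for h, xi in zip(h_true, x))   # noiseless desired signal
        e = d - sum(wi * xi for wi, xi in zip(w, x))  # error with current weights
        pending.append((x, e, mpnlms_gains(w)))
        if len(pending) > delay:                      # oldest entry is `delay` samples old
            x_old, e_old, q_old = pending.pop(0)
            norm = sum(q * xi * xi for q, xi in zip(q_old, x_old)) + delta
            w = [wi + beta * q * xi * e_old / norm
                 for wi, q, xi in zip(w, q_old, x_old)]
    return w


h_sparse = [0.0] * 16
h_sparse[2], h_sparse[7], h_sparse[11] = 0.8, -0.5, 0.3  # hypothetical sparse path
w = dmpnlms_identify(h_sparse, 4000)
mis = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, h_sparse))
                / sum(b * b for b in h_sparse))
print(f"normalized misalignment after 4000 iterations: {mis:.2e}")
```

With ε = 0.001, a coefficient of magnitude 0.001 still receives F ≈ 0.1, far above the ρ floor, so small but nonzero taps keep adapting; the division by the mean of k_l keeps the average gain at one, as (3) requires.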


2.1. Tap coefficient 

The tap coefficient is the primary block of the DMPNLMS filter. It consists of an adder that adds the input values to the value from the error control block in order to boost the signal. The output of this adder is fed to an AND gate that ANDs the looped-back output of the error control block with the output of the adder [21]-[23]. The output of the AND gate is then ORed to introduce a delay: N cascaded OR gates produce N delay units. Finally, the input is ANDed with the delayed output of the OR gates to provide the tap coefficient output [24], [25].


2.2. Ladner-Fischer logarithmic adder 

In the current work, the Ladner-Fischer adder is used. It consists of black cells, gray cells, and an AO (AND-XOR) block. The black cell computes both the generate and propagate signals, while the gray cell computes the generate signal alone. The black cell combines two AND gates and one OR gate and has two outputs: the propagate signal from an AND gate and the generate signal from the OR gate. The gray cell combines one AND gate and one OR gate, and its only output is the generate signal. Table 1 compares the logic levels, area, fanout, and wire length of different types of parallel prefix logarithmic adders.
Table 1. Comparison of different types of parallel prefix adders

| Type | Logic levels | Area | Fanout | Wire length |
|---|---|---|---|---|
| Kogge-Stone | log2 n | n log2 n - n + 1 | 2 | n/2 |
| Brent-Kung | 2 log2 n - 1 | 2n - log2 n - 2 | 2 | 1 |
| Ladner-Fischer | log2 n + 1 | (n/2) log2 n | n/4 + 1 | 1 |
| Han-Carlson | log2 n + 1 | (n/2) log2 n | 2 | n/4 |
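The prefix structure behind the log2 n + 1 logic levels listed for Ladner-Fischer in Table 1 can be checked with a bit-level Python model. This sketch assumes the common construction of the Ladner-Fischer network as one level that pairs adjacent bits, a Sklansky-style tree over the odd bit positions, and one final level for the even positions. It uses a single combine operator for both cell types (a gray cell is a black cell whose propagate output is never used). The function names are hypothetical, and the model checks logic only, not area or timing.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate) of a bit group


def combine(hi: GP, lo: GP) -> GP:
    """Black cell: G = G_hi | (P_hi & G_lo), P = P_hi & P_lo (a gray cell keeps only G)."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def ladner_fischer_add(a: int, b: int, n: int) -> Tuple[int, int, int]:
    """Add two n-bit numbers; return (sum mod 2**n, carry out, prefix levels used)."""
    assert n >= 2 and n & (n - 1) == 0, "n must be a power of two"
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # pre-processing: generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # pre-processing: propagate
    gp: List[GP] = list(zip(g, p))  # after the tree, gp[i] covers bits i..0
    # Level 1: each odd position absorbs its even neighbour -> group (i : i-1).
    for i in range(1, n, 2):
        gp[i] = combine(gp[i], gp[i - 1])
    levels = 1
    # Sklansky-style tree over the n/2 odd positions: log2(n) - 1 levels.
    odd = list(range(1, n, 2))
    span = 1
    while span < len(odd):
        nxt = gp[:]
        for k in range(len(odd)):
            if (k // span) % 2 == 1:        # node is in the upper half of a 2*span block
                j = (k // span) * span - 1  # last node of the lower half
                nxt[odd[k]] = combine(gp[odd[k]], gp[odd[j]])
        gp, span, levels = nxt, span * 2, levels + 1
    # Final level: each even position i > 0 takes the prefix held at i - 1.
    nxt = gp[:]
    for i in range(2, n, 2):
        nxt[i] = combine(gp[i], gp[i - 1])
    gp, levels = nxt, levels + 1
    carry = [0] + [gp[i][0] for i in range(n)]  # carry[i] = carry into bit i
    s = sum((p[i] ^ carry[i]) << i for i in range(n))
    return s, carry[n], levels


print(ladner_fischer_add(0b10110111, 0b01101101, 8))  # (36, 1, 4): 183 + 109 = 256 + 36
```

For n = 8 the model uses four prefix levels, matching log2 n + 1; the odd positions carry the sparse tree, and the even positions are resolved in the single extra level.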


2.3. Desired function 

In the filter's processing sequence, the desired function is placed immediately after the parallel prefix Ladner-Fischer logarithmic adder. Its primary role is subtraction: it subtracts the output of the logarithmic adder from the input signal. This difference quantifies the deviation between the expected output, as characterized by the desired function, and the present output of the filter, and it forms the basis of the filter's adaptation mechanism.


2.4. Desired block and error control block 

The desired block extracts the error. It computes the difference between the output of the desired function block and the output of the parallel prefix Ladner-Fischer logarithmic adder. The adaptation algorithm resides in this block, and it forms the loop-back path of the system [26]-[28].


3. RESULTS AND DISCUSSIONS 

Figure 2 shows the rate of convergence of the SLMS, LLAD, LMLS, quantized kernel LMS (QKLMS), and NLMS filter algorithms. From the MATLAB simulations using the Ladner-Fischer adder, the MSEs of the SLMS, LLAD, LMLS, QKLMS, and NLMS filters are 12.24 dB, -36.91 dB, -43.26 dB, -34.38 dB, and -42.52 dB, respectively. The filter length is 64 and the number of iterations is 4000. Figure 3 shows the rate of convergence of the delayed LMS (DLMS), delayed LLAD (DLLAD), delayed NLMS (DNLMS), and proportionate normalized LMS (PNLMS) filter algorithms for a filter length of 64 and 4000 iterations. From the MATLAB simulations using the Ladner-Fischer adder, the MSEs of the DLMS, DLLAD, DNLMS, and PNLMS filters are -39.14 dB, -45.18 dB, -48.07 dB, and -52.84 dB, respectively.

Figure 4 shows the rate of convergence of the delayed μ-law proportionate LMS (DMPLMS), DMPLLAD, DMPLMLS, and DMPNLMS filter algorithms for a filter length of 64 and 4000 iterations. From the MATLAB simulations using the Ladner-Fischer adder, the MSEs of the DMPLMS, DMPLLAD, DMPLMLS, and DMPNLMS filters are -57.01 dB, -59.12 dB, -63.27 dB, and -67.24 dB, respectively. Figure 5 shows the ASIC synthesized netlist for the DMPNLMS architecture [26]-[30].

Table 2 shows the MSE improvement of the current work compared with previous works: DLMS [25], DLLAD [5], DNLMS [23], PNLMS [23], DMPLMS [5], DMPLLAD [5], DMPLMLS [5], and DMPNLMS [5]. Table 3 shows the delay, area, and power reports for filters with 32-bit and 64-bit lengths. The output of the desired function, e(n), is obtained by subtracting the adaptive filter output y(n) from the desired output d(n) [31]-[34].


Figure 2. Rate of convergence of SLMS, LLAD, LMLS, QKLMS, and NLMS algorithms (x-axis: number of iterations)

Figure 3. Rate of convergence of DLMS, DLLAD, DNLMS, and PNLMS algorithms (x-axis: number of iterations)

Figure 4. Rate of convergence of DMPLMS, DMPLLAD, DMPLMLS, and DMPNLMS algorithms

Figure 5. ASIC synthesized netlist for DMPNLMS architecture


Table 2. Improvement on MSE for different algorithms

| Algorithm | MSE (dB) | Improvement (dB) |
|---|---|---|
| DLMS [25] | -39.14 | 4.94 |
| DLLAD [5] | -45.18 | 8.28 |
| DNLMS [23] | -48.07 | 5.57 |
| PNLMS [23] | -52.84 | 9.34 |
| DMPLMS [5] | -57.01 | 17.87 |
| DMPLLAD [5] | -59.12 | 13.18 |
| DMPLMLS [5] | -63.27 | 12.02 |
| DMPNLMS [5] | -67.24 | 14.04 |
| DMPNLMS (this work) | -67.54 | 14.10 |


Table 3. Delay, area, and power reports for different algorithms with 32-bit and 64-bit filter lengths

| Design | Filter length (bits) | No. of cells | Delay (ns) |  | Total area (µm²) | ADP (µm²·ns) | Leakage power (mW) | Dynamic power (mW) | Total power (mW) |
|---|---|---|---|---|---|---|---|---|---|
| DNLMS | 32 | 1,396 | 2.536 | 5 | 6,026 | 15,281 | 0.0324 | 0.0983 | 0.1307 |
| DNLMS | 64 | 3,427 | 2.416 | 5 | 17,998 | 43,478 | 0.0422 | 0.1950 | 0.2372 |
| DMPLMS | 32 | 1,432 | 13.256 | 6 | 9,248 | 122,073 | 0.0534 | 0.1215 | 0.1749 |
| DMPLMS | 64 | 18,265 | 22.563 | 6 | 19,445 | 437,512 | 0.1739 | 2.1385 | 2.3123 |
| DMPLLAD | 32 | 4,526 | 23.985 | 7 | 8,628 | 582,758 | 0.0423 | 0.3225 | 0.3648 |
| DMPLLAD | 64 | 18,924 | 46.731 | 7 | 109,614 | 5,405,820 | 0.1528 | 4.0814 | 4.2342 |
| DMPLMLS | 32 | 5,428 | 24.635 | 6 | 41,626 | 603,999 | 0.0432 | 0.9925 | 1.0357 |
| DMPLMLS | 64 | 20,735 | 48.409 | 6 | 116,122 | 5,620,304 | 0.1750 | 4.6315 | 4.8065 |
| DMPNLMS | 32 | 5,201 | 20.96 | 7 | 26,268 | 549,001 | 0.0523 | 0.3012 | 0.3535 |
| DMPNLMS | 64 | 17,056 | 38.916 | 7 | 110,462 | 4,296,971 | 0.1772 | 3.9056 | 4.0828 |
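ADP in Table 3 denotes the area-delay product, that is, total area multiplied by delay. As a worked check on the first row (32-bit DNLMS):

$$\text{ADP} = \text{Area} \times \text{Delay} = 6{,}026\ \mu\text{m}^2 \times 2.536\ \text{ns} \approx 15{,}281.9\ \mu\text{m}^2\cdot\text{ns}$$

which agrees with the listed value of 15,281 µm²·ns.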


4. CONCLUSION 

The DMPNLMS algorithm shows an improvement in MSE, a faster convergence rate, and greater stability. The synthesis results show that it is area and delay efficient, which makes it viable for applications that require higher operating speed. Proportionate-type adaptive algorithms substantially improve the convergence of sparse adaptive filters compared with the traditional LMS algorithm. However, their significant computational burden makes VLSI implementation challenging. In response to this challenge, we have put forth a number of modifications that simplify the original proportionate-type normalized LMS (Pt-NLMS) algorithms, together with efficient VLSI designs tailored to these modified algorithms. Among our proposals, DMPNLMS stands out as a robust VLSI solution. We believe that this research will encourage other researchers to explore more efficient hardware solutions, advancing sparse adaptive filter architectures through streamlined arithmetic circuits.